Note

(List of cited literature) 

1. Japanese Unexamined Patent Application Publication H10-177498

2. Japanese Unexamined Patent Application Publication S63-175937

(Claims) 1, 2, and 7
(Cited literature) 1 and 2 
(Remarks) 

Cited Literature 1 describes a fault tolerant computer system comprising a fault detection unit (voter) 
which monitors for identity of the operation of processors of a plurality of processing sets, each of which 
comprises a processor and memory and operates in sync under a common clock; a first recording 
mechanism which records memory update events subsequent to the detection of the operation of any 
processing set being out of sync with the other processing sets; and a reintegration mechanism which 
accesses said first recording mechanism, identifies the memory write location, and copies the
corresponding memory area from the memory of operating processing sets to the memory of the
out-of-sync processing set.

The invention according to Claim 1 of the present application has an arrangement which instructs all 
modules to stop when a mismatch is detected in the operation of the processors of a plurality of modules, 
which differs from the invention described in Cited Literature 1, which does not have such an arrangement. 
However, Cited Literature 2 describes a fault recovery scheme in a fault tolerant computer whereby, upon 
detection of an error, the operation of all processors is temporarily stopped, the internal state of a processor 
which had the error is made identical to the internal state of a normal processor and the operation of all 
processors is restarted. Applying this to the invention described in Cited Literature 1, a person skilled in the 
art could easily conceive of the arrangement of the inventions according to Claims 1, 2, and 7 of the present 
application. 

Therefore, it is found that the inventions according to Claims 1, 2, and 7 of the present application 
could have been easily invented by a person skilled in the art based on the inventions described in Cited 
Literature 1 and 2. 

(Claims) 3 

(Cited literature) 1 and 2 
(Remarks) 

While the data copying method of the data copy means described in Cited Literature 1 is not explicitly
described, the method of copying data by direct memory access is well known art, and employing this well
known art in Cited Literature 1 is a matter which could be easily accomplished by a person skilled in the art.

(Claims) 4 and 5 

(Cited literature) 1 and 2 

(Remarks) 

According to Figure 3, the fault detection unit (voter) of the invention described in Cited Literature 1 
detects faults based on data on an internal bus between processors and memory, and is thereby found to 
monitor memory access commands and addresses issued by the processor.

(Claims) 6

(Cited literature) 1 and 2 
(Remarks) 

It is well known that in a system comprising a plurality of processors with a cache, when the data of
any given node is rewritten, it is necessary to perform cache flush for the other nodes. Therefore, it is found
that the invention according to Claim 6 of the present application could have been easily invented by a
person skilled in the art based on the inventions described in Cited Literature 1 and 2 and on the
well-known art.

If any reasons for rejection are newly discovered, a Notification of Reasons for Rejection will be 
issued. 

BACKGROUND AND INTRODUCTION

The invention relates generally to fault tolerant computer systems such as 
lockstep fault tolerant computers which use multiple subsystems that run identically. 

In such lockstep fault tolerant computer systems, the outputs of the subsystems 
are compared within the computer and, if the outputs differ, some exceptional repair 
action is taken. 

Figure 1 of the accompanying drawings is a schematic overview of an example 
of a typical system, in which three identical processing (CPU) sets 10, 11, 12 operate 
in synchronism (sync) under a common clock 16. By a processing set is meant a 
subsystem including a processing engine, for example a central processing unit 
(CPU), and internal state storage. Figure 2 of the accompanying drawings is a 
schematic representation of such a processing set. This shows a processing engine 
20, internal state storage (memory) 22 and an internal bus 23. The processing set 
may include other elements of a computer system, but will not normally include 
input/output interfaces. External connections are also provided, for example a 
connection 13 from the internal bus 23, an input 15 for the external clock 16 and
hardware interrupt inputs 14. 

As shown in Figure 1, the outputs of the three processing sets 10, 11, 12 are 
supplied to a fault detector unit (voter) 17 to monitor the operation of the processing 
sets 10, 11, 12. If the processing sets 10, 11, 12 are operating correctly, they
produce identical outputs to the voter 17. Accordingly, if the outputs match, the
voter 17 passes commands from the processing sets 10, 11, 12 to an input/output 
(I/O) subsystem 18 for action. If, however, the outputs from the processing sets 
differ, this indicates that something is amiss, and the voter causes some corrective 
action to occur before acting upon an I/O operation. 

Typically, a corrective action includes the voter supplying a signal via the 
appropriate line 14 to a processing set showing a fault to cause a 'change me' light
(not shown) to be illuminated on the faulty processing set. The defective processing 
set is switched off and an operator then has to replace it with a correctly functioning 
unit. In the example shown, a defective processing set can normally be easily 
identified by majority voting because of the two-to-one vote that will occur if one
processing set fails or develops a temporary or permanent fault.
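
By way of illustration only, the two-to-one vote described above can be sketched in
Python as follows. The function name `vote` and the representation of each processing
set's output as a comparable value are assumptions made for this sketch; an actual
voter 17 would compare bus signals in hardware.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Return (majority_output, indices_of_dissenting_sets).

    If no strict majority exists, the majority output is None and every
    set is reported, since voting alone cannot identify the faulty one.
    """
    counts = Counter(outputs)
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(outputs):
        # No strict majority: further diagnosis would be needed.
        return None, list(range(len(outputs)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

With three processing sets, `vote(["w", "x", "w"])` returns the agreed output `"w"`
and identifies the set at index 1 as the one that is out of step.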

However, the invention is not limited to such systems, but is also applicable 
to systems where extensive diagnostic operations are needed to identify the faulty 
processing set. The system need not have a single voter, and need not vote merely 
I/O commands. The invention is generally applicable to synchronous systems with 
redundant components which run in lockstep.

Lockstep systems depend on total synchronisation of the processing sets that 
make up the fault tolerant processing core. Accordingly, the processing sets need 
hardware which operates identically, and, in addition, the internal stored state of the 
data in the processing sets also needs to be identical. Part of the process of
integrating a new processing set into a running system involves copying the contents 
of the main memory of a running system to the new processing set. Because main 
memory can be very large, for example of the order of gigabytes, this process can 
take rather a long time in computing terms. 

Lockstep computer systems can go out of sync for various reasons. The prime 
reason is a failure of a single processing set in a permanent way. Recovery from 
such a failure normally involves removal of the failed unit, replacement with a 
functioning unit and reinstatement of the functioning unit. Clearly, the new 
processing set will have no notion of the contents of memory of a running processing 
set, and all of the main memory from the running system will have to be copied to 
the new processing set. 

Other, less traumatic out-of-sync events can often be diagnosed automatically 
by the running computer system and can lead to the automatic reintegration of the 
out-of-sync processing set without its replacement. For example, a soft data error in 
a dynamic memory, perhaps caused by a cosmic ray event, could cause a minor upset 
in operation that could be fixed automatically. However, this has still required the 
reintegration of the memory state of the out-of-sync processing set, that is the copying
of the contents of the main memory from a running system to the out-of-sync
processing set. Accordingly, because the main memory can be very large, this can
still take a long time in computing terms.

The invention seeks to provide an automatic and rapid way of recovering from 
minor out-of-sync events which avoids the problems of the prior art.
SUMMARY OF THE INVENTION

In accordance with one aspect of the invention, there is provided a memory 
management system for a fault tolerant computer system, the memory management 
system comprising: a first recording mechanism which can be activated to record 
memory update events; a second recording mechanism having a capacity to record at 
least a limited number of memory update events; a fault input for a fault signal to 
activate the first recording mechanism in the event of a fault event; and a memory 
reintegration mechanism to reintegrate at least parts of memory identified in the first
and second recording mechanisms. 

Embodiments of the invention take advantage of the fact that, after a minor
out-of-sync event between processing sets in a lockstep system, most of the memory
contents of the out-of-sync processing set is initially identical to that in a running 
system. Only a relatively small number of locations within the memory system of 
either the out-of-sync processing set or the running system will have been modified. 
However, the divergence will increase with time as the running system continues to 
operate and execute its normal processing load. Embodiments of the invention allow 
for the divergence to be tracked and accounted for and, moreover, for any memory 
update events around the out-of-sync event and before the first recording mechanism 
has been activated to be caught. 

Preferably, the recording of memory updates (writes) is not based on 
recording each address accessed, but rather on memory segments (pages) updated 
(written to). In other words, the first and/or second recording mechanisms preferably
record the segments (or pages) updated (written to). This can be done effectively 
using a segment (or page) memory with a bit per segment (page) for identifying the 
segments (pages) written to. 
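
A minimal sketch of such a bit-per-page record is given below, assuming a
byte-addressed memory and a page size of 8192 bytes. The class name `PageBitmap`, its
methods and the page size are illustrative choices and are not taken from the
described embodiments.

```python
PAGE_SIZE = 8192  # bytes per page; an assumed value, not taken from the text


class PageBitmap:
    """One bit per page; a set bit means the page has been written to."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def record_write(self, address: int) -> None:
        """Mark the page containing the written address."""
        page = address // PAGE_SIZE
        if not 0 <= page < self.num_pages:
            raise IndexError("address outside recorded memory")
        self.bits[page // 8] |= 1 << (page % 8)

    def is_written(self, page: int) -> bool:
        return bool(self.bits[page // 8] & (1 << (page % 8)))

    def written_pages(self) -> list:
        return [p for p in range(self.num_pages) if self.is_written(p)]
```

Recording by page rather than by address keeps the record small: in this sketch,
sixteen pages need only two bytes of storage, however many writes fall within them.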

In accordance with another aspect of the invention, there is provided a fault
tolerant computer system comprising a plurality of synchronous processing sets, each
comprising a processor with internal memory and operating in lockstep, and an
out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync
signal, wherein each processing set also comprises: a first recording mechanism 
which can be activated to record memory write events; a second recording mechanism 
having a capacity to record at least a limited number of memory write events; a fault
input for receiving the out-of-sync signal to activate the first recording mechanism in
the event of an out-of-sync event; and a memory reintegration mechanism to
reintegrate in an out-of-sync processing set at least parts of memory identified in the
first and second recording mechanisms. 

In accordance with a further aspect of the invention, there is provided a 
method for reintegration of a processing set of a fault tolerant computer system 
following a fault, wherein the fault tolerant computer system comprises a plurality of 
synchronous processing sets, each comprising a processor and internal memory and
operating in lockstep, and a fault detector for detecting a fault event and for 
generating a fault signal, the method comprising: 

maintaining a temporary record of memory update events over a limited period; 
responding to a fault to activate a further record of memory update events following 
the fault state; and 

performing memory reintegration in a processing set in which a fault has occurred for 
at least those parts of memory identified in the temporary and further memory
records. 

In an embodiment of the invention a record is kept of at least selected memory 
access events (memory write events) to main memory after the out-of-sync event, so 
that only the modified memory locations need to be copied to reintegrate the
out-of-sync processing set.
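
The selective copy can be sketched as below. This is a simplified model under stated
assumptions: each memory is represented as a `bytearray` of equal length, the set of
modified pages has already been collected from the records of both processing sets,
and the function name `reintegrate` is hypothetical.

```python
from typing import Iterable


def reintegrate(source: bytearray, target: bytearray,
                dirty_pages: Iterable[int], page_size: int) -> int:
    """Copy only the listed pages from the running set (source) to the
    out-of-sync set (target). Returns the number of bytes copied."""
    if len(source) != len(target):
        raise ValueError("memories must be the same size")
    copied = 0
    for page in sorted(set(dirty_pages)):
        start = page * page_size
        end = start + page_size
        if end > len(source):
            raise ValueError("page lies outside memory")
        target[start:end] = source[start:end]
        copied += page_size
    return copied
```

Pages written on either processing set after the divergence must be included in
`dirty_pages`, since a page modified only on the out-of-sync set also differs from the
running set.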

In one embodiment of the invention, the first recording mechanism is a 
memory management unit comprising a RAM with an entry for each of a plurality of
memory pages, a code being written to a page entry each time that page is written to 
when the first recording mechanism has been activated. 

Preferably, the first recording mechanism has an enable input connected to 
receive the fault (out-of-sync) signal.

The first recording mechanism can record an arbitrarily large number of 
written pages, up to the total number of pages in the processing unit. 

The second recording mechanism preferably maintains a rolling record of 
recent memory update events up to a number sufficient to cover the time to activate 
the first recording mechanism following a fault event. 
The second recording mechanism can comprise a first-in-first-out buffer, the
first recording mechanism in one embodiment being connected to an output of the 
first-in-first-out buffer. In this configuration, the first-in-first-out buffer stores up 
to a predetermined number of update addresses, an address decoder can be connected 
to the output of the first-in-first-out buffer to generate a page signal representative of 
a memory update address output from the first-in-first-out buffer and the address 
decoder is responsive to the out-of-sync signal to pass the page signal to the first
recording mechanism. 
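
One way to picture this configuration is as a delay line: every write address passes
through the buffer, and only addresses leaving the buffer after the out-of-sync signal
has been asserted are decoded into pages and recorded. The sketch below models that
behaviour in Python. The class name `DelayedRecorder`, the `flush` step that drains the
buffer before reintegration, and the use of a set for the first recording mechanism
are assumptions of this sketch rather than features stated in the text.

```python
from collections import deque


class DelayedRecorder:
    """FIFO of recent write addresses feeding a page-level record."""

    def __init__(self, depth: int, page_size: int) -> None:
        self.depth = depth            # predetermined number of addresses held
        self.page_size = page_size
        self.fifo = deque()
        self.out_of_sync = False
        self.dirty_pages = set()      # stands in for the first recording mechanism

    def on_write(self, address: int) -> None:
        """Every memory write enters the FIFO; the oldest entry may leave."""
        self.fifo.append(address)
        if len(self.fifo) > self.depth:
            self._decode(self.fifo.popleft())

    def _decode(self, address: int) -> None:
        """Address decoder: pass the page signal on only when out of sync."""
        if self.out_of_sync:
            self.dirty_pages.add(address // self.page_size)

    def signal_out_of_sync(self) -> None:
        self.out_of_sync = True

    def flush(self) -> None:
        """Assumed step: drain buffered addresses before reintegration."""
        while self.fifo:
            self._decode(self.fifo.popleft())
```

Because the buffer still holds the most recent writes when the signal arrives, writes
made shortly before the signal are recorded as well, provided the buffer depth covers
the detection latency.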

Alternatively, the second recording mechanism can comprise a logic analyzer. 
This can reduce implementation costs as fault tolerant computers typically include a 
logic analyzer for fault analysis. 

Where the output of the second recording mechanism does not form the input 
to the first recording mechanism, the operation of the second recording mechanism is
preferably inhibited in response to the out-of-sync signal. 

The first recording mechanism can comprise a software generated table in 
which a record corresponding to a page of memory is marked with a code whenever 
that page has been written. This record can be maintained by software which updates 
entries in the translation look-aside buffer of the processor. The second recording 
mechanism can be the contents of the TLB together with a list of pages recently 
flushed from the TLB. 

In response to an out-of-sync input, software can search the TLB and the list 
for pages which may recently have been written and may mark these as written in the 
first recording mechanism, then continue to maintain the first recording mechanism 
until the processing units are reintegrated. 
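
The software search described above might look like the following sketch. It assumes
that each TLB entry exposes a per-page 'modified' flag and that pages flushed from the
TLB are conservatively treated as possibly written; both points, together with the
function names `seed_written_table` and `log_write`, are assumptions made for
illustration.

```python
from typing import Dict, Iterable, Set


def seed_written_table(tlb: Dict[int, bool],
                       recently_flushed: Iterable[int],
                       written_table: Set[int]) -> Set[int]:
    """Mark pages that may have been written around the out-of-sync event.

    tlb maps a page number to an assumed per-entry 'modified' flag.
    """
    for page, modified in tlb.items():
        if modified:
            written_table.add(page)
    # Flushed pages no longer carry a flag in this model, so they are
    # conservatively treated as possibly written.
    written_table.update(recently_flushed)
    return written_table


def log_write(written_table: Set[int], address: int, page_size: int) -> None:
    """Keep maintaining the table until reintegration completes."""
    written_table.add(address // page_size)
```

Marking a page that was not in fact written costs only an unnecessary page copy,
whereas missing a written page would leave the memories different, so the sketch errs
on the side of marking.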

Preferably, the memory reintegration mechanism is operative to reintegrate 
memory pages identified in the first and second recording mechanisms. 

The invention is applicable to a computer system comprising three
synchronous processing sets operating in lockstep, wherein an out-of-sync detector 
determines an out-of-sync processing set by majority voting. 

In this case, reintegration of an out-of-step processing set can be achieved by,
in response to the identification of an out-of-sync processing set, selecting one of the 
remaining two processing sets, supplying an interrupt to the out-of-sync processing 
set and the remaining processing set to cause the out-of-sync and remaining
processing sets to idle, reintegrating one of the out-of-sync and remaining processing 
sets while maintaining a software log of memory write events, and then reintegrating 
the other of the out-of-sync and remaining processing sets using the software log. 

The invention described can reduce the reintegration time of a processing set 
in a lockstep fault tolerant computer from many minutes to just fractions of a second. 
During the reintegration period, the computer is vulnerable to a further failure in the 
running processing set. Thus the reduction in reintegration time has a significant 
benefit on the overall availability of the computer. 

DESCRIPTION OF THE DRAWINGS 

An embodiment of the invention will be described hereinafter with reference 
to the accompanying drawings in which like reference signs relate to like features and 
in which: 

Figure 1 is a schematic overview of a triple-modular-redundant fault tolerant 
computer system; 

Figure 2 is a schematic representation of elements of a processor set of the 
system of Figure 1; 

Figure 3 is a schematic representation of a processor set of an embodiment of 
the invention; 

Figure 4 is a schematic representation of a memory management unit; 
Figure 5 is a schematic representation of an example of a first recording 
mechanism; 

Figure 6 is a schematic representation of an example of a secondary recording 
mechanism; 

Figure 7 is a schematic representation of another example of a secondary
recording mechanism;

Figure 8 is a schematic representation of an example of a combined first and 
secondary recording mechanism; and 

Figure 9 is a schematic representation of an alternative configuration of the 
example of Figure 8.
DESCRIPTION OF THE PREFERRED EMBODIMENT

Figure 3 is a schematic block diagram to represent elements of an example of 
the invention in functional terms. Figure 3 generally represents one of the processing 
sets 10/11/12 for a fault tolerant computer system such as, for example, the system 
shown in Figure 1. 

In Figure 3, a processing engine (e.g., a central processing unit (CPU)) 20 and
internal state storage (memory) 22 are connected by an internal bus 23. External 
connections are also provided, for example a connection 13 from the internal bus 23, 
an input 15 for an external clock and hardware interrupt input 14. 

Also shown schematically in Figure 3 are a first recording mechanism 25 
which can be activated to record memory update events, a second recording 
mechanism 26 having a capacity to record at least a limited number of memory 
update events and a memory reintegration mechanism 27 to reintegrate at least parts of
memory identified in the first and second recording mechanisms. As shown in Figure 
3, each of the mechanisms 25, 26 and 27 is shown connected to the internal bus 23. 
This is because the first and second recording mechanisms 25 and 26 need to monitor
memory access events to identify when and where memory updating occurs (memory
writes). Also the reintegration mechanism 27 needs to access the first and second 
recording mechanisms to determine where memory writes have occurred in the 
memories of out-of-sync and running processing sets and then to copy corresponding 
memory portions from the running to out-of-sync processor set memories. However, 
the mechanisms 25, 26 and 27 can be implemented in various ways as will be 
explained in the following description. Various implementations involve different 
combinations of hardware and software and interconnection of the various elements 
will typically differ from that illustrated in Figure 3. For example, the reintegration 
mechanism will typically be implemented in software, and may be implemented in a
control computer associated with and/or forming part of the voter 17. Also, the first
recording mechanism 25 may not be connected directly to the bus 23, but may be 
connected via the second recording mechanism 26. Also the first and second 
recording mechanisms may be implemented to a greater or lesser extent in software, 
as will be described hereinafter. 

Computer systems typically include memory management hardware to keep 
track of and control the use of main memory. It is also usual to divide memory into
pages of specified size and to keep a small record of access controls to each page. 
Hardware mechanisms also exist for updating a record for a page with a bit that 
indicates that the page has been modified. This bit is the so-called 'dirty' bit for a 
page. A page of memory is called 'clean' when no writes have been made to it to 
change it from its initial state, and 'dirty' after such a write has been made. Software
can cause a page to be marked 'clean' by clearing the dirty bit for that page in the 
memory management unit record. Hardware will later set the bit to 1 to indicate that
the page has been written to. In normal operation, many pages of computer memory
will be considered by the memory management unit to be dirty most of the time. 
Accordingly, if a conventional memory management unit operating in a conventional 
manner is provided in each of the processing sets of a lockstep fault tolerant computer 
system, it is thus likely that many pages will be marked dirty when an out-of-sync 
event occurs. 

Because the memory management unit of a conventionally configured 
computer processing set is usually under the control of the running operating system, 
in a first embodiment of the invention an additional memory management unit is 
provided purely for the use of the software which reintegrates processing sets after 
an out-of-sync event. 

Figure 4 shows a conventional memory management unit 40 which has been
customised to include only information on which pages of memory are dirty and
which are clean. In the following description this type of memory management unit
is termed a 'dirty ram'. Software 42 may access the dirty ram storage 46 to check
which pages are dirty, and can write to it directly to change the status of a page to dirty
or clean. In addition, hardware 44 automatically changes to 'dirty' the state of the
record for any page of main memory which is written to via the bus 23. In this
embodiment only one bit of dirty ram storage 46 is used for each entire page of main
memory. It is not necessary that the size of the 'pages' monitored by the dirty ram
is the same as the size used by other memory management units in the system, but
it is often both convenient and efficient that the pages all have the same size.
Computers tend to work in pages and a write access to one part of a page often
implies that other locations within the same page will also be written. However, a
conventional memory management unit as shown in Figure 3 will not in itself be
sufficient to implement the task in hand because most of the pages are usually dirty
as described above.

Figure 5 is a schematic block diagram of a dirty ram 50 for a first
embodiment of the invention. In Figure 5, the dirty ram 50 is provided with a
separate enable input 58 whereby the hardware 54 only begins to log dirty pages in
the dirty ram storage 56 after the processing sets have gone out of sync. The signal
on the enable input is asserted in response to the detection by the voter 17 of an
out-of-sync event.
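By way of illustration only, the behaviour of a dirty ram with an enable input may be modelled in software as follows. This is a minimal sketch and not the hardware of Figure 5: the class name `DirtyRam`, its method names and the page size of 4096 bytes are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not specified in the description)


class DirtyRam:
    """Hypothetical model: one dirty bit per page, logging gated by an enable input."""

    def __init__(self, num_pages):
        self.bits = [False] * num_pages
        self.enabled = False  # stands in for the enable input 58

    def clear(self):
        # Software gives every page 'clean' status while the sets run in sync.
        self.bits = [False] * len(self.bits)

    def assert_enable(self):
        # Models the out-of-sync signal from the voter being asserted.
        self.enabled = True

    def bus_write(self, address):
        # Called for every write seen on the bus; logged only when enabled.
        if self.enabled:
            self.bits[address // PAGE_SIZE] = True

    def dirty_pages(self):
        return {page for page, dirty in enumerate(self.bits) if dirty}


ram = DirtyRam(num_pages=8)
ram.bus_write(0x1000)   # in sync: not logged
ram.assert_enable()     # out-of-sync event detected
ram.bus_write(0x2004)   # falls in page 2
ram.bus_write(0x5FFF)   # falls in page 5
print(sorted(ram.dirty_pages()))  # [2, 5]
```

In this model the write made before the enable input is asserted leaves no trace, which is the property that keeps the record small at the time of reintegration.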

The dirty ram enable input 58 allows the operation of the dirty ram system. 
In normal operation, with processing sets running in sync, the dirty ram enable input 
is not asserted and the dirty ram 50 is set by the software 52 such that all pages are 
given 'clean' status.

When an out-of-sync event occurs, the enable input 58 becomes asserted.
While the enable input is asserted (i.e., while the processing sets are out of sync), the
dirty ram logs the pages of main memory written to. The pages which are written
to will be those which potentially differ on the running and the out-of-sync processing
sets. A dirty ram with an enable input as in Figure 5 is provided in each processing
set and is connected there to the respective system bus 23. While the processing sets
are running in sync, each dirty ram is held in the clean state by the software 52.
When it is detected that the processing sets are running out of sync the dirty ram
logging is enabled. In this embodiment, a hardware enable signal 58 is generated in
the out-of-sync detection hardware (i.e., the voter 17) on detecting that the processing
sets 10, 11, 12 are out-of-sync. In other words, whenever the voter detects a
difference in the output from the processing sets, it generates a signal which is
supplied to each processing set to form the asserted out-of-sync signal. Once
asserted, the out-of-sync signal is not negated until the processing sets have been
reinstated. In other embodiments, the enable signal could be generated by software.

After an out-of-sync event, software and/or hardware mechanisms act to
reconfigure the fault tolerant computer system. The system carries on normal
operations with at least one processing set. At least one processing set, including the
out-of-sync processing set, is taken out of operation. This out-of-sync processing set
thereby stops running normal operations and waits to be reintegrated into the running
system. Memory writes done on the running and the out-of-sync processing set
produce divergence in the main memory contents in the running and out-of-sync
processing sets.

When software on the running system comes to reintegrate the out-of-sync 
processing set, it accesses the dirty ram on the running system to find pages of 
memory that have been dirtied since the out-of-sync event. It also accesses the dirty
ram in the out-of-sync processing set. This dirty ram tells which pages have been
modified by the out-of-sync processor(s) since the divergence began. If a page of 
memory is mentioned as dirty in any of the dirty rams, on the running and out-of- 
sync processing sets, it has to be copied by the reintegration software to bring the 
out-of-sync processing sets back into sync. If a page of memory is not marked as 
dirty in any dirty ram, it can be ignored, as it will still be correct on the out-of-sync 
processing set. 
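The selection rule just described amounts to taking the union of the dirty pages reported by every dirty ram and copying those pages from the running processing set. The following sketch illustrates this under stated assumptions: memory is modelled as a Python list with one entry per page, and the function names `pages_to_copy` and `reintegrate` are hypothetical.

```python
def pages_to_copy(*dirty_records):
    """Return every page marked dirty in at least one record."""
    result = set()
    for record in dirty_records:
        result |= set(record)
    return result


def reintegrate(source_memory, target_memory, pages):
    """Copy the listed pages from the running set to the out-of-sync set."""
    for page in pages:
        target_memory[page] = source_memory[page]


# Both memories were identical until the out-of-sync event.
running = ["a0", "a1", "a2", "b3"]      # running set later wrote page 3
out_of_sync = ["a0", "x1", "a2", "a3"]  # out-of-sync set later wrote page 1

running_dirty = {3}       # from the dirty ram of the running set
out_of_sync_dirty = {1}   # from the dirty ram of the out-of-sync set

to_copy = pages_to_copy(running_dirty, out_of_sync_dirty)
reintegrate(running, out_of_sync, to_copy)
print(sorted(to_copy))         # [1, 3]
print(out_of_sync == running)  # True
```

Pages 0 and 2 are never touched by the copy, since neither dirty ram reports them.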

In an alternative embodiment of the invention, if the processing sets have a 
dirty ram store with no enable pin, operating all the time to log dirty pages, software 
could be activated by a hardware signal on the out-of-sync event to clean out the dirty 
ram. This software must carefully note any pages which it itself dirties during the 
cleaning process. 

In yet another alternative embodiment, an ordinary memory management unit 
can also be used to collect the dirty page information. In this alternative 
embodiment, software is arranged to modify the page tables at the out-of-sync event 
so that all pages of main memory are write protected. This means that write cycles 
to memory will result in a bus error exception to the processor. The processor can 
then act on each bus error first to add the written page to a software-maintained list 
of dirty pages, then to remove the write protection for that page so that future writes 
there will complete normally. This has the advantage that only a single list of dirty
pages need be examined by the reintegration software, with no searching through 
clean pages to look for occasional dirty ones. 
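A minimal software model of this write-protection scheme is given below for illustration. It assumes a simple per-page protection flag and a handler invoked on each protected write; the class `WriteProtectTracker` and its methods are hypothetical names and do not correspond to any particular operating system interface.

```python
class WriteProtectTracker:
    """Hypothetical model of dirty-page collection by write protection."""

    def __init__(self, num_pages):
        self.write_protected = [False] * num_pages
        self.dirty_list = []  # software-maintained list of dirty pages

    def on_out_of_sync(self):
        # Modify the page tables so that all pages are write protected.
        self.write_protected = [True] * len(self.write_protected)

    def write(self, page):
        if self.write_protected[page]:
            self._bus_error(page)
        # The write itself would complete here.

    def _bus_error(self, page):
        # First note the page, then lift the protection for later writes.
        self.dirty_list.append(page)
        self.write_protected[page] = False


tracker = WriteProtectTracker(num_pages=8)
tracker.write(2)          # before the event: not recorded
tracker.on_out_of_sync()
tracker.write(4)          # first write traps and is recorded
tracker.write(4)          # second write completes without a trap
tracker.write(1)
print(tracker.dirty_list)  # [4, 1]
```

Each page traps at most once, so the list holds one entry per dirtied page.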

It should be noted, however, that it is desirable to provide a separate 'dirty
memory' rather than to use conventional memory management units with
additional software to collect dirty page information. This is because the use of
conventional memory management units suffers from twin disadvantages. Firstly, the
conventional computer operating system software may be using the memory
management unit for its own purposes. Secondly, conventional memory management
units work for only a single processor and cannot cover multi-processor operation or
direct memory access by I/O devices.

Whatever method is used to collect the data on dirty memory pages, there are 
likely to be problems near the out-of-sync event. Some time has to elapse between 
the detection of the out-of-sync event and the enabling of the dirty ram data 
collection, and a few dirty pages may go unrecorded in this period. Exactly how 
many pages are missed depends on the implementation of the dirty ram, but even a 
single missed page is enough to make useless the scheme of copying only some, not 
all, of main memory after an out-of-sync event. 

Accordingly, in embodiments of the invention, a mechanism is also provided 
for recording memory write events around the out-of-sync event. 

The mechanisms described above for providing a record of dirty pages are put 
into operation starting at the out-of-sync event, and can record all the pages required 
following that event. However, to complement this a separate, temporary, record is 
required for pages dirtied close to the out-of-sync event. This separate record has to 
take account of write events over a limited time, preferably on a rolling basis. This 
separate record needs to have a capacity sufficient to accommodate write events which 
might occur between the out-of-sync event itself and the time the mechanisms
described above can start recording. This separate record is called the secondary 
dirty page record in the following description, to distinguish it from the 'dirty ram' 
already discussed above. 

The secondary dirty page record has to be operating continually (at least until
an out-of-sync event occurs), because it cannot be predicted when a fault will send
the system out-of-sync. It is the job of the secondary dirty page record to remember those
pages, dirtied just before or after an out-of-sync event, that may not be properly
collected by the dirty ram. The secondary dirty page record also has to have limited
time memory. If it remembered pages dirtied indefinitely far in the past, it would
eventually list all memory pages as being dirty. It should only remember far enough
back past the out-of-sync event to ensure that divergently dirtied pages which the
primary dirty page store cannot catch are included.

In some embodiments described below, where the secondary dirty page record
operates in parallel with the dirty ram, the secondary page record is frozen at or soon
after the out-of-sync event. In these embodiments, if it were left running, the limited-
time nature of the record could eventually allow the important information about the
out-of-sync event to be overwritten or lost. The freezing can conveniently be done by
responding to the asserting of the dirty ram enable signal to inhibit the operation of
the secondary page record.

Once the secondary dirty page record has been frozen, either software or
hardware can examine it and cause dirty pages listed there to be added to the primary
dirty ram, or to a separate list for copying by the reintegration software. Both
out-of-sync and running processing sets have secondary dirty page records. Software can
examine and compare the records and deduce which pages were actually dirtied in
sync by the processing sets, if needed. This will decrease the number of pages to be
copied.

In one embodiment a logic analyzer is used to collect information for the 
secondary dirty page record. Figure 6 shows a logic analyzer 60 observing the bus 
23 of a processing set. A logic analyzer 60 with a trigger mechanism 66, clock 
qualifier 62 and address generator 64, is provided for each processing set. The logic 
analyzer is usually running. The analyzer 60 is triggered causing data collection to 
stop, by the assertion of the processing set out-of-sync signal, which same signal 
starts the primary dirty ram collecting data. The logic analyzer eventually stops 
operating and keeps a record of computer bus operation both before and after the out- 
of-sync event. By analysing the logic analyzer traces from the out-of-sync and 
running processing sets after an out-of-sync event, it is possible to deduce which of 
the stored transactions is a divergent write cycle. The relevant pages can then be 
added to a set of pages written to in a software-maintained secondary dirty page
record. The logic analyzer needs to store at least the address and control information
for each bus cycle so that pages written to can be determined. The analysis of the
logic analyzer outputs can readily be effected by software routines.
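One possible form of such a software routine is sketched below. The description does not specify the analysis, so the sketch makes a conservative assumption: every write cycle at or after the first point where the two traces differ is treated as potentially divergent. The trace format (a list of `(kind, address)` pairs, with `"W"` for a write), the page size and the function name `divergent_write_pages` are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def divergent_write_pages(trace_a, trace_b):
    """Pages written in either trace from the first differing bus cycle onward."""
    # If no cycle differs, any unmatched tail of the longer trace is treated
    # as divergent.
    first_diff = min(len(trace_a), len(trace_b))
    for index, (cycle_a, cycle_b) in enumerate(zip(trace_a, trace_b)):
        if cycle_a != cycle_b:
            first_diff = index
            break
    pages = set()
    for trace in (trace_a, trace_b):
        for kind, address in trace[first_diff:]:
            if kind == "W":
                pages.add(address // PAGE_SIZE)
    return pages


running_trace = [("R", 0x0000), ("W", 0x1000), ("W", 0x2000), ("R", 0x3000)]
out_of_sync_trace = [("R", 0x0000), ("W", 0x1000), ("W", 0x4008), ("W", 0x5000)]
print(sorted(divergent_write_pages(running_trace, out_of_sync_trace)))  # [2, 4, 5]
```

The write to page 1 appears identically in both traces before the divergence and is therefore not listed.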

The advantage of using a logic analyzer for the secondary dirty page record
is that lockstep fault tolerant computers will typically have logic analyzers built in,
and triggered on the out-of-sync event, for fault diagnosis. It is then merely
necessary to provide the software to control and analyze the output of the logic
analyzers as described above.

As an alternative, a write buffer can provide storage for the secondary dirty 
page record. Figure 7 shows a first-in-first-out memory 70 used as a short-term 
buffer for writes to main memory over the internal bus 23. In normal in-sync 
operation, writes to main memory are decoded by write decode logic 71 and the page 
number of each write is written into the FIFO 70. When the out-of-sync event occurs, the
hardware out-of-sync detection signal 58 inhibits further writes into the FIFO 70. 
Later, software can examine the FIFO 70 contents to add pages to the dirty page list. 
The advantage of the write buffer for a secondary dirty page record is that it is 
simpler in both software and hardware than a logic analyzer. 
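The write buffer arrangement can be modelled, for illustration only, with a bounded queue that drops its oldest entry when full and stops accepting entries once the out-of-sync signal is asserted. The class `WriteFifo`, the buffer depth and the page size are assumptions of this sketch.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size in bytes


class WriteFifo:
    """Hypothetical model of a short-term buffer of recently written pages."""

    def __init__(self, depth):
        # A deque with maxlen discards the oldest entry when a new one arrives.
        self.entries = deque(maxlen=depth)
        self.inhibited = False

    def record_write(self, address):
        # Write decode: keep only the page number of each write.
        if not self.inhibited:
            self.entries.append(address // PAGE_SIZE)

    def freeze(self):
        # Models the out-of-sync signal inhibiting further writes to the FIFO.
        self.inhibited = True

    def pages(self):
        return set(self.entries)


fifo = WriteFifo(depth=3)
for address in (0x0000, 0x1000, 0x2000, 0x3000):
    fifo.record_write(address)
fifo.freeze()               # out-of-sync event
fifo.record_write(0x7000)   # ignored after the freeze
print(sorted(fifo.pages()))  # [1, 2, 3]
```

Page 0 has aged out of the three-entry buffer, which reflects the limited-time nature of the secondary record.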

In further alternatives, the write buffer can be arranged in series with the dirty
ram.

Figure 8 illustrates a first example of a combination of a first and secondary 
recording mechanism, where a write buffer is arranged in series with a dirty ram. 
The arrangement of Figure 8 is based on a combination of the arrangements of 
Figures 4 and 7. In this case, a FIFO buffer 80 is operating continually to record 
write events to memory for a processing set with the write events being decoded 
continually by the write decode logic 81 to supply page addresses for storage in the 
FIFO 80. In the present case it is not necessary for the write decode logic to receive 
an inhibit input as will be described later. Page addresses supplied to the FIFO 80 
appear, after a delay, at the output of the FIFO 80. However, they are prevented 
from being passed to the dirty ram storage 86 by means of the gate 84 until the gate 
is enabled by an out-of-sync signal on the line 58. This out-of-sync signal effectively
provides an enable signal for the dirty ram as it then enables the page addresses from
the FIFO 80 to be supplied to the dirty ram storage 86, whereby appropriate page bits
can be set. Software 82 can be used to clear the dirty ram storage 86 at any time
prior to an out-of-sync event so that it is 'clean' when the out-of-sync signal is
supplied.

In this embodiment, it is not necessary to disable the FIFO buffer 80, as the
contents of the FIFO buffer 80 are automatically stored in the dirty ram after a time
dependent on the size of the FIFO 80 and the frequency of the write events. In this
embodiment the reintegration mechanism preferably takes account of the dirty ram
86 and the FIFO 80.
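The series arrangement may be illustrated by the following sketch, in which page addresses leaving a fixed-depth FIFO are either discarded or passed to the dirty ram depending on the state of the out-of-sync signal. The class `SeriesRecorder` and its attribute names are hypothetical, and the dirty ram storage is modelled simply as a set of page numbers.

```python
from collections import deque


class SeriesRecorder:
    """Hypothetical model of a FIFO in series with a gated dirty ram."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
        self.dirty = set()        # stands in for the dirty ram storage
        self.out_of_sync = False  # stands in for the signal on line 58

    def record_write(self, page):
        self.fifo.append(page)
        if len(self.fifo) > self.depth:
            oldest = self.fifo.popleft()
            # The gate passes the FIFO output only once out of sync.
            if self.out_of_sync:
                self.dirty.add(oldest)

    def pages_for_reintegration(self):
        # Reintegration takes account of both the dirty ram and the FIFO.
        return self.dirty | set(self.fifo)


rec = SeriesRecorder(depth=2)
for page in (10, 11, 12):   # written while in sync
    rec.record_write(page)
rec.out_of_sync = True      # out-of-sync event
for page in (13, 14):       # written afterwards
    rec.record_write(page)
print(sorted(rec.dirty))                      # [11, 12]
print(sorted(rec.pages_for_reintegration()))  # [11, 12, 13, 14]
```

Pages 11 and 12 were written shortly before the event yet still reach the dirty ram, because they were still in the FIFO when the gate opened; page 10 had already left the FIFO and was discarded.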

Figure 9 illustrates a second example of a combination of a first and secondary 
recording mechanism, where a write buffer is arranged in series with a dirty ram.
In this example, a FIFO buffer 90 is operating continually to record write events to
memory for a processing set. The output from the FIFO buffer 90 is supplied to an
address decoder 91 which is only enabled in response to the out-of-sync enable signal 
58. When the address decoder 91 is not enabled, the output of the FIFO buffer is 
effectively discarded. Only when the address decoder is enabled are dirty page bits 
output from the address decoder 91 for storage in the appropriate page location in the 
dirty ram storage 96. 

Optionally, the out-of-sync enable signal is also supplied to the dirty ram 
itself, although it will be appreciated that supplying the out-of-sync signal to the 
address decoder effectively provides an enable signal for the dirty ram. As for the 
Figure 8 example, in this embodiment it is not necessary to disable the FIFO buffer
90, as the contents of the FIFO buffer 90 are automatically stored in the dirty ram
after a time dependent on the size of the FIFO 90 and the frequency of the write 
events. In this embodiment the reintegration mechanism preferably takes account of 
the dirty ram 96 and the FIFO 90. 

A further, software-implemented embodiment makes use of a table look-aside
buffer (TLB) and an associated TLB miss routine, plus a dirty page store which is
created in main memory. A TLB forms a standard part of most computer addressing
schemes using paged memory. Some computers maintain TLB entries with a
software TLB miss routine, instead of fixed hardware. In this embodiment, and
following a fault event, software can note the TLB entries currently specifying
writable pages, and add those to the soft dirty page store. Software can also transfer
to this a list of writable pages recently flushed from the TLB. This list is maintained
by the miss routine in normal circumstances, before the fault event, and indicates
pages which might have been written close to the fault event and immediately flushed
from the TLB. Following this, while reintegration is in progress, software in the
TLB miss routine adds each page written to the soft dirty page store. In this way,
a record of dirty pages can be created.
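A simplified model of this software-implemented embodiment is sketched below. Several details are assumptions made for the example and are not taken from the description: the TLB evicts its oldest entry when full, the list of recently flushed writable pages has a fixed length, and the TLB is emptied at the fault event so that every later access passes through the miss routine. The class `SoftTlb` and its methods are hypothetical names.

```python
from collections import OrderedDict, deque


class SoftTlb:
    """Hypothetical model of a software-maintained TLB with a soft dirty page store."""

    def __init__(self, capacity, flushed_history):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> writable flag, oldest first
        self.recently_flushed = deque(maxlen=flushed_history)
        self.soft_dirty = set()       # dirty page store held in main memory
        self.fault_seen = False

    def miss(self, page, writable):
        """Miss routine: load an entry, evicting the oldest one if full."""
        if len(self.entries) >= self.capacity:
            old_page, old_writable = self.entries.popitem(last=False)
            if old_writable and not self.fault_seen:
                self.recently_flushed.append(old_page)
        self.entries[page] = writable
        # After the fault event, each writable page seen is recorded as dirty.
        if self.fault_seen and writable:
            self.soft_dirty.add(page)

    def on_fault(self):
        self.fault_seen = True
        self.soft_dirty.update(p for p, w in self.entries.items() if w)
        self.soft_dirty.update(self.recently_flushed)
        self.entries.clear()  # assumption: force later accesses through miss()


tlb = SoftTlb(capacity=2, flushed_history=4)
tlb.miss(1, writable=True)
tlb.miss(2, writable=False)
tlb.miss(3, writable=True)   # evicts page 1, which was writable
tlb.on_fault()
tlb.miss(5, writable=True)   # written during reintegration
tlb.miss(6, writable=False)
print(sorted(tlb.soft_dirty))  # [1, 3, 5]
```

The store is conservative: a writable page is listed whether or not it was actually written, which can only add pages to the copy and never omit one.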

The approaches described above can be applied in a triple-modular-redundant
(TMR) system. Some TMR lockstep systems switch to running with a single 
processing set when an out-of-sync event occurs. This requires two separate 
reintegration phases to recover. 

An example will be described where a TMR system is running with processing 
sets 10, 11 and 12 in sync. In this example, reintegration is performed by software
under the control of a control computer forming part of the voter 17. 

For this example, it is assumed here that processing set 12 suffers a DRAM soft
error which takes the system out of sync. The voter detects the out-of-sync event and
arbitrarily chooses processing set 10 to carry on and idles processing sets 11 and 12.
Each of processing sets 10, 11 and 12 has its own primary dirty ram and secondary
dirty page record, all capturing the differentially dirtied data since the out-of-sync
event.

To reintegrate processing set 11, all of the pages mentioned as dirty in
processing set 10 dirty ram, processing set 10 secondary dirty page record, processing
set 11 dirty ram and processing set 11 secondary dirty page record are copied from
processing set 10 to processing set 11.

Then, to reintegrate processing set 12, all the pages mentioned as dirty in 
processing set 10 dirty ram, processing set 10 secondary dirty page record, processing
set 12 dirty ram and processing set 12 secondary dirty page record are copied from 
processing set 10 to processing set 12. 

During the reintegration of processing set 11, the processing set 10 has to 
continue to record the dirtying of pages. If the processing set 10 dirty page ram is 
operative during some part of this process, software must maintain a separate list 
of pages being dirtied to add to the list of pages copied to processing set 12. 
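The page selection for the two reintegration phases of this example can be sketched as follows. Only the choice of pages is modelled, not the copying itself; the record contents are invented for illustration, and the function name `pages_to_copy` and the dictionary layout are assumptions.

```python
def pages_to_copy(records, survivor, target, extra=frozenset()):
    """Union of the dirty ram and secondary record of the survivor and the target."""
    pages = set(extra)
    for processing_set in (survivor, target):
        pages |= records[processing_set]["dirty_ram"]
        pages |= records[processing_set]["secondary"]
    return pages


# Illustrative record contents for processing sets 10, 11 and 12.
records = {
    10: {"dirty_ram": {4, 8}, "secondary": {3}},
    11: {"dirty_ram": {5}, "secondary": {3}},
    12: {"dirty_ram": {6, 8}, "secondary": {2, 3}},
}

# Phase 1: reintegrate processing set 11 from processing set 10.
phase1 = pages_to_copy(records, survivor=10, target=11)

# Pages dirtied by processing set 10 during phase 1, kept in a separate
# software-maintained list and added to the second copy.
dirtied_during_phase1 = {9}

# Phase 2: reintegrate processing set 12 from processing set 10.
phase2 = pages_to_copy(records, survivor=10, target=12, extra=dirtied_during_phase1)

print(sorted(phase1))  # [3, 4, 5, 8]
print(sorted(phase2))  # [2, 3, 4, 6, 8, 9]
```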

There have been described, therefore, embodiments of the invention in which
there are provided: a primary dirty ram, for example a dedicated memory
management unit or a conventional memory management unit with control
mechanisms; a secondary dirty page record which records a limited number of write
events to memory around the out-of-sync event, sufficient to capture all divergent
writes until the primary dirty ram is enabled; and a mechanism, either hardware or
software, to start the primary dirty ram recording the dirtying of pages shortly after
an out-of-sync event, and where appropriate to stop the secondary dirty page record.
By examining the primary dirty ram and secondary dirty page record of two
processing sets and copying the pages of memory mentioned as dirty in any dirty page
record to the out-of-sync processing set, the system can be reinstated in an efficient
manner.

Although particular embodiments of the invention have been described herein, 
it will be appreciated that many modifications and/or additions may be made within 
the spirit and scope of the present invention. 

For example, various combinations of the first and second recording 
mechanisms described above may be provided. Also the various elements and 
techniques described above may be implemented using any appropriate hardware or 
software technology. 

Although a specific example of a TMR system has been described, the 
invention is not limited thereto. Moreover, other methods than majority voting can 
be used to identify an out-of-sync processing set. 

Although particular embodiments of the inventions have been described, it will 
be appreciated that the invention is not limited thereto, and many modifications and/or 
additions may be made within the spirit and scope of the invention as defined in the 
appended Claims. For example, different combinations of the features of the 
dependent Claims may be combined with the features of the independent Claims.

WHAT I CLAIM IS:

1. A memory management system for a fault tolerant computer system, said 
memory management system comprising: 

a first recording mechanism which can be activated to record memory update events; 
a second recording mechanism having a capacity to record at least a limited number 
of memory update events; 

a fault input for a fault signal to activate said first recording mechanism in the event 
of a fault event; and 

a memory reintegration mechanism to reintegrate at least parts of memory identified
in said first and second recording mechanisms. 

2. A memory management system according to Claim 1, wherein said first 
recording mechanism is a memory management unit comprising storage having an 
entry for each of a plurality of memory pages, a code being written to a page entry 
each time that page is written to when said first recording mechanism has been 
activated. 

3. A memory management system according to Claim 2, wherein said first 
recording mechanism has an enable input connected to receive the fault signal for 
activating said first recording mechanism. 

4. A memory management system according to Claim 1, wherein said second 
recording mechanism maintains a record of recent memory update events up to a 
number sufficient to cover the time to activate said first recording mechanism 
following a fault event. 

5. A memory management system according to Claim 4, wherein said second
recording mechanism comprises a first-in-first-out buffer.

6. A memory management system according to Claim 5, wherein said first
recording mechanism is connected to an output of said first-in-first-out buffer.

7. A memory management system according to Claim 6, wherein said first-in-
first-out buffer stores up to a predetermined number of update addresses, an address
decoder is connected to said output of said first-in-first-out buffer to generate a page
signal representative of a memory update address output from said first-in-first-out
buffer and said address decoder is responsive to said fault signal to pass said page
signal to said first recording mechanism.

8. A memory management system according to Claim 1, wherein said second 
recording mechanism comprises a logic analyzer. 

9. A memory management system according to Claim 1, wherein said second 
recording mechanism maintains a record of recent memory update events up to a 
number sufficient to cover the time to activate said first recording mechanism 
following a fault event, the operation of said second recording means being inhibited 
in response to said fault signal. 

10. A memory management system according to Claim 9, wherein said first 
recording mechanism comprises a software generated list of update events. 

11. A memory management system according to Claim 1, wherein said second 
recording mechanism comprises a table look-aside buffer and a memory access table 
maintained in main memory. 

12. A memory management system according to Claim 1, wherein said memory 
reintegration mechanism is operative to reintegrate memory pages identified in said 
first and second recording mechanisms. 



13. A fault tolerant computer system comprising a plurality of synchronous 
processing sets, each comprising a processor and internal memory and operating in 
lockstep, and an out-of-sync detector for detecting an out-of-sync event and for 
generating an out-of-sync signal, wherein each processing set includes a memory 
management system according to Claim 1.

14. A fault tolerant computer system comprising a plurality of synchronous
processing sets, each comprising a processor and internal memory and operating in
lockstep, and an out-of-sync detector for detecting an out-of-sync event and for
generating an out-of-sync signal, wherein each processing set also comprises:
a first recording mechanism which can be activated to record memory write events;
a second recording mechanism having a capacity to record at least a limited number
of memory write events;

a fault input for receiving said out-of-sync signal to activate said first recording
mechanism in the event of an out-of-sync event; and

a memory reintegration mechanism to reintegrate in an out-of-sync processing set at
least parts of memory identified in said first and second recording mechanisms.

15. A fault tolerant computer system according to Claim 14, wherein said first
recording mechanism is a memory management unit comprising a RAM with an entry 
for each of a plurality of memory pages, a code being written to a page entry each 
time that page is written to when said first recording mechanism has been activated. 

16. A fault tolerant computer system according to Claim 15, wherein said first
recording mechanism has an enable input connected to receive the out-of-sync signal 
for activating said first recording mechanism. 

17. A fault tolerant computer system according to Claim 14, wherein said second
recording mechanism maintains a record of recent memory update events up to a
number sufficient to cover the time to activate said first recording mechanism
following an out-of-sync event.

18. A fault tolerant computer system according to Claim 17, wherein said second 
recording mechanism comprises a first-in-first-out buffer.

19. A fault tolerant computer system according to Claim 18, wherein said first
recording mechanism is connected to an output of said first-in-first-out buffer.

20. A fault tolerant computer system according to Claim 19, wherein said first-in-
first-out buffer stores up to a predetermined number of update addresses, an address
decoder is connected to said output of said first-in-first-out buffer to generate a page
signal representative of a memory update address output from said first-in-first-out
buffer and said address decoder is responsive to said out-of-sync signal to pass said
page signal to said first recording mechanism.

21. A fault tolerant computer system according to Claim 14, wherein said second
recording mechanism comprises a logic analyzer. 

22. A fault tolerant computer system according to Claim 14, wherein said second 
recording mechanism maintains a record of recent memory update events up to a 
number sufficient to cover the time to activate said first recording mechanism 
following an out-of-sync event, the operation of said second recording means being 
inhibited in response to said out-of-sync signal. 

23. A fault tolerant computer system according to Claim 22, wherein said first 
recording mechanism comprises a software generated list of update events. 

24. A fault tolerant computer system according to Claim 14, wherein said second 
recording mechanism comprises a table look-aside buffer and a memory access table 
maintained in main memory. 

25. A fault tolerant computer system according to Claim 14, wherein said memory
reintegration mechanism is operative to reintegrate memory pages identified in said 
first and second recording mechanisms. 



26. A fault tolerant computer system according to Claim 14, comprising three 
synchronous processing sets operating in lockstep, wherein said out-of-sync detector 
determines an out-of-sync processing set by majority voting. 

27. A fault tolerant computer system according to Claim 26, wherein said
out-of-sync detector is arranged to select one of the remaining two processing sets, to supply
an input to the out-of-sync processing set and the remaining processing set to cause
said out-of-sync and remaining processing sets to idle, to reintegrate one of said
out-of-sync and remaining processing sets while maintaining a software log of memory
write events, and then to reintegrate said other of said out-of-sync and remaining
processing sets using said software log.

28. A method for reintegration of a processing set of a fault tolerant computer 
system comprising a plurality of synchronous processing sets, each comprising a 
processor and internal memory and operating in lockstep, and a fault detector for 
detecting a fault event and for generating a fault signal, said method comprising: 
maintaining a temporary record of memory update events over a limited period; 
responding to said fault signal to activate a further record of memory update events 
following said fault state; and 

performing memory reintegration in a processor in which a fault has occurred for at 
least those parts of memory identified in said temporary and further memory records. 

29. A method according to Claim 28, wherein said fault event is an out-of-sync 
event. 

30. A method according to Claim 28, wherein the further record is stored in a 
storage of a memory management unit, a page entry in said storage being provided 
for each page of memory, a code being written to a page entry each time that page 
is written to when said first recording mechanism has been activated. 

31. A method according to Claim 28, comprising maintaining a record of recent
memory update events up to a number sufficient to cover the time to activate said
further record.

32. A method according to Claim 31, wherein said temporary record is stored in 
a first-in-first-out buffer. 



33. A method according to Claim 32, comprising connecting a recording
mechanism for the further recording to an output of said first-in-first-out buffer. 

34. A method according to Claim 33, comprising the steps of storing up to a
predetermined number of update addresses in said first-in-first-out buffer, supplying
the output of said first-in-first-out buffer to an address decoder to generate a page
signal representative of a memory update address output from said first-in-first-out
buffer, and recording said page signal as part of said further recording when said
fault signal is active.

35. A method according to Claim 31, wherein said temporary record is stored in 
a logic analyzer. 

36. A method according to Claim 31, comprising the steps of maintaining said 
temporary record of recent memory update events up to a number sufficient to cover 
the time to activate said first recording mechanism following a fault, said temporary 
record being inhibited in response to said fault signal. 

37. A method according to Claim 36, comprising generating a list of update events 
by software. 

38. A method according to Claim 31, comprising forming said temporary record
by means of a table look-aside buffer and a memory access table maintained in main 
memory. 

39. A method according to Claim 31, comprising the step of reintegrating memory 
pages identified in said temporary and further records.

40. A method according to Claim 31, comprising three synchronous processing
sets operating in lockstep, wherein an out-of-sync detector determines an out-of-sync
processing set by majority voting.

41. A method according to Claim 31, comprising the steps of, in response to the
identification of an out-of-sync processing set, selecting one of the remaining two
processing sets, supplying an interrupt to the out-of-sync processing set and the
remaining processing set to cause said out-of-sync and remaining processing sets to
idle, reintegrating one of said out-of-sync and remaining processing sets while
maintaining a software log of memory write events, and then integrating said other
of said out-of-sync and remaining processing sets using said software log.

ABSTRACT OF THE DISCLOSURE

A memory management system for a fault tolerant computer system includes 
a first recording mechanism (25) which can be activated to record memory update
(write) events, a second recording mechanism (26) having a capacity to record at least 
a limited number of memory update events, a fault input for a fault signal to activate 
the first recording mechanism in the event of a fault (out-of-sync) event and a 
memory reintegration mechanism (27) to reintegrate at least parts of memory 
identified in the first and second recording mechanisms. The recording of memory 
updates is preferably done on a page basis using a dirty ram and secondary dirty page 
record. Recovery from a minor out-of-sync event between processing sets in a 
lockstep system can be achieved rapidly and efficiently by copying memory pages 
identified in the first and second recording mechanisms from a running to an
out-of-sync processing set as only a relatively small number of locations within the memory
system of either the out-of-sync or the running processing set system will have been 
modified. 



